
    Multi-Modal 3D Object Detection in Autonomous Driving: a Survey

    In the past few years, we have witnessed the rapid development of autonomous driving. However, achieving full autonomy remains a daunting task due to the complex and dynamic driving environment. As a result, self-driving cars are equipped with a suite of sensors to conduct robust and accurate environment perception. As the number and type of sensors keep increasing, combining them for better perception is becoming a natural trend. So far, there has been no in-depth review that focuses on multi-sensor fusion based perception. To bridge this gap and motivate future research, this survey reviews recent fusion-based 3D detection deep learning models that leverage multiple sensor data sources, especially cameras and LiDARs. In this survey, we first introduce the background of popular sensors for autonomous cars, including their common data representations as well as the object detection networks developed for each type of sensor data. Next, we discuss some popular datasets for multi-modal 3D object detection, with a special focus on the sensor data included in each dataset. Then we present an in-depth review of recent multi-modal 3D detection networks, considering three aspects of the fusion: fusion location, fusion data representation, and fusion granularity. After this detailed review, we discuss open challenges and point out possible solutions. We hope that our detailed review can help researchers embark on investigations in the area of multi-modal 3D object detection.
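    As a side note for readers skimming this taxonomy, the minimal Python sketch below shows one way the survey's three fusion axes (fusion location, fusion data representation, fusion granularity) could be encoded as a tag for cataloguing detectors. The enum member names and the example detector are illustrative assumptions, not categories defined by the survey.

```python
# Illustrative sketch only: encoding the survey's three fusion axes as a tag.
# The specific member names below are assumptions for illustration.
from dataclasses import dataclass
from enum import Enum, auto


class FusionLocation(Enum):
    EARLY = auto()         # fuse raw or low-level sensor data
    INTERMEDIATE = auto()  # fuse learned feature maps
    LATE = auto()          # fuse per-sensor detection outputs


class FusionRepresentation(Enum):
    POINT = auto()         # LiDAR points decorated with image features
    VOXEL_OR_BEV = auto()  # fused in a voxel grid / bird's-eye-view plane
    PROPOSAL = auto()      # fused within region proposals


class FusionGranularity(Enum):
    ROI_LEVEL = auto()     # per region of interest
    POINT_LEVEL = auto()   # per point / per pixel


@dataclass
class FusionTag:
    """Coarse description of where, in what form, and how finely a model fuses sensors."""
    location: FusionLocation
    representation: FusionRepresentation
    granularity: FusionGranularity


# Example: tagging a hypothetical camera-LiDAR detector that decorates each
# LiDAR point with image features before the detection head.
example_detector_tag = FusionTag(
    FusionLocation.EARLY,
    FusionRepresentation.POINT,
    FusionGranularity.POINT_LEVEL,
)
```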

    OCC-VO: Dense Mapping via 3D Occupancy-Based Visual Odometry for Autonomous Driving

    Visual Odometry (VO) plays a pivotal role in autonomous systems, with a principal challenge being the lack of depth information in camera images. This paper introduces OCC-VO, a novel framework that capitalizes on recent advances in deep learning to transform 2D camera images into 3D semantic occupancy, thereby circumventing the traditional need for concurrent estimation of ego poses and landmark locations. Within this framework, we utilize the TPV-Former to convert surround-view camera images into 3D semantic occupancy. Addressing the challenges presented by this transformation, we have specifically tailored a pose estimation and mapping algorithm that incorporates a Semantic Label Filter and a Dynamic Object Filter, and finally utilizes a Voxel PFilter to maintain a consistent global semantic map. Evaluations on the Occ3D-nuScenes benchmark not only showcase a 20.6% improvement in Success Ratio and a 29.6% enhancement in trajectory accuracy against ORB-SLAM3, but also emphasize our ability to construct a comprehensive map. Our implementation is open-sourced and available at: https://github.com/USTCLH/OCC-VO. Comment: 7 pages, 3 figures
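    To make the described pipeline concrete, here is a minimal, hypothetical Python sketch of the processing loop implied by the abstract (images -> 3D semantic occupancy -> filtering -> pose estimation -> map update). The functions `tpv_former`, `semantic_label_filter`, `dynamic_object_filter`, `estimate_pose`, and `voxel_pfilter_update`, as well as the label sets, are placeholders assumed for illustration; they do not reproduce the authors' open-source implementation.

```python
# A schematic OCC-VO-style loop under assumed data shapes; all components are stubs.
import numpy as np

DYNAMIC_LABELS = {1, 2}   # assumed label ids for dynamic classes (e.g. car, pedestrian)
IGNORED_LABELS = {0}      # assumed label ids to drop (e.g. unknown/unreliable classes)


def tpv_former(surround_images):
    """Placeholder: surround-view 2D images -> (N, 4) array of [x, y, z, semantic_label]."""
    rng = np.random.default_rng(0)
    xyz = rng.uniform(-10, 10, size=(100, 3))
    labels = rng.integers(0, 5, size=(100, 1))
    return np.hstack([xyz, labels])


def semantic_label_filter(occ):
    """Drop occupancy cells whose semantic class is considered unreliable."""
    return occ[~np.isin(occ[:, 3], list(IGNORED_LABELS))]


def dynamic_object_filter(occ):
    """Drop occupancy cells belonging to dynamic objects before registration."""
    return occ[~np.isin(occ[:, 3], list(DYNAMIC_LABELS))]


def estimate_pose(occ, global_map, prev_pose):
    """Placeholder for registering the filtered occupancy against the global map."""
    return prev_pose  # a real system would refine the 4x4 pose here


def voxel_pfilter_update(global_map, occ, pose):
    """Placeholder for a probabilistic per-voxel update of the global semantic map."""
    homo = np.hstack([occ[:, :3], np.ones((len(occ), 1))])
    world = (pose @ homo.T).T[:, :3]
    return np.vstack([global_map, np.hstack([world, occ[:, 3:]])])


pose = np.eye(4)
global_map = np.empty((0, 4))
for _ in range(3):                       # stand-in for a stream of camera frames
    occ = tpv_former(surround_images=None)
    occ = dynamic_object_filter(semantic_label_filter(occ))
    pose = estimate_pose(occ, global_map, pose)
    global_map = voxel_pfilter_update(global_map, occ, pose)
```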

    Mga Modulates Bmpr1a Activity by Antagonizing Bs69 in Zebrafish

    MAX giant associated protein (MGA) is a dual transcription factor containing both T-box and bHLHzip DNA-binding domains. In vitro studies have shown that MGA functions as a transcriptional repressor or activator to regulate transcription from promoters containing either E-box or T-box binding sites. BS69 (ZMYND11), a multidomain-containing (i.e., PHD, BROMO, PWWP, and MYND) protein, has been shown to selectively recognize the histone variant H3.3 lysine 36 trimethylation mark (H3.3K36me3), modulate RNA Polymerase II elongation, and function as an RNA splicing regulator. Mutations in MGA or BS69 have been linked to multiple cancers or neural developmental disorders. Here, by TALEN- and CRISPR/Cas9-mediated loss-of-gene-function assays, we show that zebrafish Mga and Bs69 are required to maintain proper Bmp signaling during early embryogenesis. We found that Mga protein localized in the cytoplasm modulates Bmpr1a activity through physical association with Zmynd11/Bs69. The Mynd domain of Bs69 specifically binds the kinase domain of Bmpr1a and interferes with its phosphorylation and activation of Smad1/5/8. Mga acts to antagonize Bs69 and facilitate the Bmp signaling pathway by disrupting the Bs69-Bmpr1a association. Functionally, Bmp signaling under the control of Mga and Bs69 is required for properly specifying the ventral tailfin cell fate.

    P^3O: Transferring Visual Representations for Reinforcement Learning via Prompting

    It is important for deep reinforcement learning (DRL) algorithms to transfer their learned policies to new environments that have different visual inputs. In this paper, we introduce Prompt-based Proximal Policy Optimization (P^3O), a three-stage DRL algorithm that transfers visual representations from a target to a source environment by applying prompting. The process of P^3O consists of three stages: pre-training, prompting, and predicting. In particular, we specify a prompt-transformer for representation conversion and propose a two-step training process to train the prompt-transformer for the target environment, while the rest of the DRL pipeline remains unchanged. We implement P^3O and evaluate it on the OpenAI CarRacing video game. The experimental results show that P^3O outperforms state-of-the-art visual transfer schemes. In particular, P^3O allows the learned policies to perform well in environments with different visual inputs, which is much more effective than retraining the policies in these environments. Comment: This paper has been accepted for presentation at the IEEE International Conference on Multimedia & Expo (ICME) in 202
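    For intuition about the three-stage structure, the sketch below lays out pre-training, prompting, and predicting with a frozen encoder/policy and a trainable prompt-transformer. The module layout, shapes, and omitted training loops are assumptions for illustration; they are not the authors' architecture or code.

```python
# A minimal sketch of the three-stage P^3O idea under assumed shapes and modules.
import torch
import torch.nn as nn

OBS_DIM, REPR_DIM, N_ACTIONS = 64, 32, 5

# Stage 1: pre-training -- encoder + policy trained with PPO in the source
# environment (the RL training loop itself is omitted; only the layout is shown).
encoder = nn.Sequential(nn.Linear(OBS_DIM, REPR_DIM), nn.ReLU())
policy_head = nn.Linear(REPR_DIM, N_ACTIONS)

# Stage 2: prompting -- a prompt-transformer maps target-environment
# representations toward the source representation space; encoder and policy
# stay frozen while only the prompt-transformer is trained.
prompt_transformer = nn.Sequential(
    nn.Linear(REPR_DIM, REPR_DIM), nn.ReLU(), nn.Linear(REPR_DIM, REPR_DIM)
)
for module in (encoder, policy_head):
    for p in module.parameters():
        p.requires_grad_(False)

optimizer = torch.optim.Adam(prompt_transformer.parameters(), lr=1e-3)
# ... train prompt_transformer on target-environment data with `optimizer` here ...

# Stage 3: predicting -- act in the target environment through the
# prompt-transformer, with the rest of the DRL pipeline unchanged.
target_obs = torch.randn(1, OBS_DIM)
with torch.no_grad():
    logits = policy_head(prompt_transformer(encoder(target_obs)))
    action = torch.argmax(logits, dim=-1)
print(int(action))
```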

    Managing the Mobility of a Mobile Sensor Network Using Network Dynamics

    Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection

    LiDAR and Radar are two complementary sensing approaches: LiDAR specializes in capturing an object's 3D shape, while Radar provides longer detection ranges as well as velocity hints. Though seemingly natural, how to efficiently combine them for improved feature representation is still unclear. The main challenge arises from the fact that Radar data are extremely sparse and lack height information; therefore, directly integrating Radar features into LiDAR-centric detection networks is not optimal. In this work, we introduce a bi-directional LiDAR-Radar fusion framework, termed Bi-LRFusion, to tackle these challenges and improve 3D detection for dynamic objects. Technically, Bi-LRFusion involves two steps: first, it enriches Radar's local features by learning important details from the LiDAR branch to alleviate the problems caused by the absence of height information and extreme sparsity; second, it combines the LiDAR features with the enhanced Radar features in a unified bird's-eye-view representation. We conduct extensive experiments on the nuScenes and ORR datasets and show that our Bi-LRFusion achieves state-of-the-art performance for detecting dynamic objects. Notably, the Radar data in these two datasets have different formats, which demonstrates the generalizability of our method. Code is available at https://github.com/JessieW0806/BiLRFusion. Comment: accepted by CVPR202
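    The following sketch illustrates the two-step flow described above, assuming both branches already produce bird's-eye-view (BEV) feature maps. The convolutional enrichment and fusion layers and the channel sizes are illustrative assumptions, not the paper's actual LiDAR-to-Radar enrichment or fusion modules.

```python
# Schematic two-step LiDAR-Radar BEV fusion under assumed feature-map shapes.
import torch
import torch.nn as nn

C_LIDAR, C_RADAR, H, W = 64, 32, 128, 128

# Step 1: enrich the sparse, height-less Radar BEV features with details taken
# from the LiDAR branch (approximated here by a conv over the concatenated maps).
radar_enricher = nn.Sequential(
    nn.Conv2d(C_LIDAR + C_RADAR, C_RADAR, kernel_size=3, padding=1),
    nn.ReLU(),
)

# Step 2: fuse the LiDAR features with the enhanced Radar features into a unified
# BEV representation that a detection head would consume.
bev_fuser = nn.Sequential(
    nn.Conv2d(C_LIDAR + C_RADAR, C_LIDAR, kernel_size=3, padding=1),
    nn.ReLU(),
)

lidar_bev = torch.randn(1, C_LIDAR, H, W)
radar_bev = torch.randn(1, C_RADAR, H, W)

enhanced_radar = radar_enricher(torch.cat([lidar_bev, radar_bev], dim=1))
fused_bev = bev_fuser(torch.cat([lidar_bev, enhanced_radar], dim=1))
print(fused_bev.shape)  # torch.Size([1, 64, 128, 128])
```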